Minimization via duality
We show how to use duality theory to construct minimized versions of a wide class of automata. We work out three cases in detail: (a variant of) ordinary automata, weighted automata and probabilistic automata. The basic idea is that instead of constructing a maximal quotient we go to the dual and look for a minimal subalgebra and then return to the original category. Duality ensures that the minimal subobject becomes the maximally quotiented object.
Online Popularity and Topical Interests through the Lens of Instagram
Online socio-technical systems can be studied as a proxy for the real world to
investigate human behavior and social interactions at scale. Here we focus on
Instagram, a media-sharing online platform whose popularity has grown to
hundreds of millions of users. Instagram exhibits a mixture of features
including social structure, social tagging and media sharing. The network of
social interactions among users models various dynamics including
follower/followee relations and users' communication by means of
posts/comments. Users can upload and tag media such as photos and pictures, and
they can "like" and comment on each piece of information on the platform. In this
work we investigate three major aspects of our Instagram dataset: (i) the
structural characteristics of its network of heterogeneous interactions, to
unveil the emergence of self-organization and topically-induced community
structure; (ii) the dynamics of content production and consumption, to
understand how global trends and popular users emerge; (iii) the behavior of
users labeling media with tags, to determine how they devote their attention
and to explore the variety of their topical interests. Our analysis provides
clues to understand human behavior dynamics on socio-technical systems,
specifically users and content popularity, the mechanisms of users'
interactions in online environments and how collective trends emerge from
individuals' topical interests. Comment: 11 pages, 11 figures, Proceedings of ACM Hypertext 201
Problematic Advertising and its Disparate Exposure on Facebook
Targeted advertising remains an important part of the free web browsing
experience, where advertisers' targeting and personalization algorithms
together find the most relevant audience for millions of ads every day.
However, given the wide use of advertising, this also enables using ads as a
vehicle for problematic content, such as scams or clickbait. Recent work that
explores people's sentiments toward online ads, and the impacts of these ads on
people's online experiences, has found evidence that online ads can indeed be
problematic. Further, there is the potential for personalization to aid the
delivery of such ads, even when the advertiser targets with low specificity. In
this paper, we study Facebook -- one of the internet's largest ad platforms --
and investigate key gaps in our understanding of problematic online
advertising: (a) What categories of ads do people find problematic? (b) Are
there disparities in the distribution of problematic ads to viewers? and if so,
(c) Who is responsible -- advertisers or advertising platforms? To answer these
questions, we empirically measure a diverse sample of user experiences with
Facebook ads via a 3-month longitudinal panel. We categorize over 32,000 ads
collected from this panel, and survey participants' sentiments toward
their own ads to identify four categories of problematic ads. Statistically
modeling the distribution of problematic ads across demographics, we find that
older people and minority groups are especially likely to be shown such ads.
Further, given that 22% of problematic ads had no specific targeting from
advertisers, we infer that ad delivery algorithms (advertising platforms
themselves) played a significant role in the biased distribution of these ads. Comment: Accepted to USENIX Security 202
Large-scale diversity estimation through surname origin inference
The study of surnames as both linguistic and geographical markers of the past
has proven valuable in several research fields spanning from biology and
genetics to demography and social mobility. This article builds upon the
existing literature to conceive and develop a surname origin classifier based
on a data-driven typology. This enables us to explore a methodology to describe
large-scale estimates of the relative diversity of social groups, especially
when such data is scarcely available. We subsequently analyze the
representativeness of surname origins for 15 socio-professional groups in
France.
Understanding the Role of Registrars in DNSSEC Deployment
The Domain Name System (DNS) provides a scalable, flexible name resolution service. Unfortunately, its unauthenticated architecture has become the basis for many security attacks. To address this, DNS Security Extensions (DNSSEC) were introduced in 1997. DNSSEC's deployment requires support from the top-level domain (TLD) registries and registrars, as well as participation by the organization that serves as the DNS operator. Unfortunately, DNSSEC has seen poor deployment thus far: despite being proposed nearly two decades ago, only 1% of .com, .net, and .org domains are properly signed. In this paper, we investigate the underlying reasons why DNSSEC adoption has been remarkably slow. We focus on registrars, as most TLD registries already support DNSSEC and registrars often serve as DNS operators for their customers. Our study uses large-scale, longitudinal DNS measurements to study DNSSEC adoption, coupled with experiences collected by trying to deploy DNSSEC on domains we purchased from leading domain name registrars and resellers. Overall, we find that a select few registrars are responsible for the (small) DNSSEC deployment today, and that many leading registrars do not support DNSSEC at all, or require customers to take cumbersome steps to deploy DNSSEC. Further frustrating deployment, many of the mechanisms for conveying DNSSEC information to registrars are error-prone or present security vulnerabilities. Finally, we find that using DNSSEC with third-party DNS operators such as Cloudflare requires the domain owner to take a number of steps that 40% of domain owners do not complete. Having identified several operational challenges for full DNSSEC deployment, we make recommendations to improve adoption.
Arrows, like Monads, are Monoids
Monads are by now well-established as a programming construct in functional languages. Recently, the notion of "Arrow" was introduced by Hughes as an extension, not with one, but with two type parameters. At first, these Arrows may look somewhat arbitrary. Here we show that they are categorically fairly civilised, by showing that they correspond to monoids in suitable subcategories of bifunctors C^op × C → C. This shows that, at a suitable level of abstraction, arrows are like monads, which are monoids in categories of functors C → C. Freyd categories have been introduced by Power and Robinson to model computational effects, well before Hughes' Arrows appeared. It is often claimed (informally) that Arrows are simply Freyd categories. We shall make this claim precise by showing how monoids in categories of bifunctors exactly correspond to Freyd categories.
Characterising Probabilistic Processes Logically
In this paper we work on (bi)simulation semantics of processes that exhibit
both nondeterministic and probabilistic behaviour. We propose a probabilistic
extension of the modal mu-calculus and show how to derive characteristic
formulae for various simulation-like preorders over finite-state processes
without divergence. In addition, we show that even without the fixpoint
operators this probabilistic mu-calculus can be used to characterise these
behavioural relations in the sense that two states are equivalent if and only
if they satisfy the same set of formulae. Comment: 18 pages
Escaping the Big Brother: an empirical study on factors influencing identification and information leakage on the Web
This paper presents a study on factors that may increase the risk of personal information leakage, due to the possibility of connecting user profiles that are not explicitly linked together. First, we introduce a technique for user identification based on cross-site checking and linking of user attributes. Then, we describe the experimental evaluation of the identification technique both in a real setting and on an online sample, showing its accuracy in discovering unknown personal data. Finally, we combine the results on the accuracy of identification with the results of a questionnaire completed by the same subjects who performed the test in the real setting. The aim of the study was to discover possible factors that make users vulnerable to this kind of technique. We found that the number of social networks used, their features and especially the number of profiles abandoned and forgotten by the user are factors that increase the likelihood of identification and the privacy risks.
It's not stealing if you need it: A panel on the ethics of performing research using public data of illicit origin
In a world where sensitive data can be published to a worldwide audience with the press of a button, researchers are increasingly making use of datasets that were publicized under questionable circumstances. In many cases, such research would otherwise not
- …